Predicting Bank Customer Churn

Exploratory Data Analysis

The dataset contains more female customers than male customers.

The distribution of dependent counts is fairly close to normal.

About 30 percent of customers have an unknown education level.

About half of the customers are married, and about 7% are divorced.

We have a low kurtosis value pointing to a very flat shaped distribution (as shown in the plots above as well), meaning we cannot assume normality of the feature.
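As a rough illustration of the kurtosis check, the sketch below computes excess (Fisher) kurtosis with NumPy, where a normal distribution scores 0 and flatter distributions score below 0. The two samples are synthetic stand-ins, not columns from the dataset, and the helper name `excess_kurtosis` is hypothetical.

```python
import numpy as np

def excess_kurtosis(values):
    """Excess (Fisher) kurtosis: about 0 for normal data, negative for flatter shapes."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m4 = np.mean(centered ** 4)  # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
flat_feature = rng.uniform(0, 1, size=10_000)  # synthetic flat-shaped feature
bell_feature = rng.normal(0, 1, size=10_000)   # synthetic normal reference

flat_kurt = excess_kurtosis(flat_feature)
bell_kurt = excess_kurtosis(bell_feature)
print(f"flat: {flat_kurt:.2f}, normal reference: {bell_kurt:.2f}")
```

A clearly negative value, as for the flat sample here, is the kind of result that argues against assuming normality.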

The distribution of the total number of products held by the customer looks close to uniform, so it may turn out to be a weak predictor of churn status.

As we can see, the plot separates into four distinct parts, so we can divide this feature into groups and examine the results.
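One way to divide a numeric feature into four groups is sketched below with `np.digitize`. The feature values, churn flags, and cut points are all made up for illustration; in practice the cut points would be read off the plot.

```python
import numpy as np

# Hypothetical feature values and churn flags (1 = churned); not taken from the dataset.
feature = np.array([3, 12, 18, 25, 31, 37, 44, 52, 58, 66, 71, 79])
churned = np.array([1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])

# Assumed cut points that split the feature into four groups.
edges = np.array([20, 40, 60])
group = np.digitize(feature, edges)  # group index 0..3 for each sample

# Compare group size and churn rate across the four groups.
for g in range(len(edges) + 1):
    mask = group == g
    print(f"group {g}: n={mask.sum()}, churn rate={churned[mask].mean():.2f}")

group_sizes = np.bincount(group, minlength=len(edges) + 1)
```

Comparing the churn rate per group is a quick way to judge whether the grouping carries any signal.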

Only 16% of the samples represent churned customers, so we need to upsample the churn samples to match the size of the regular-customer class. This gives the models selected later a chance to produce better results.

Data preprocessing

We now one-hot encode all the categorical features that describe the different statuses of a customer.
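A minimal sketch of this encoding step with `pd.get_dummies` follows. The column names and values are hypothetical placeholders, since the actual column list is not shown here.

```python
import pandas as pd

# Hypothetical columns and values, used only to illustrate the encoding.
customers = pd.DataFrame({
    "Marital_Status": ["Married", "Single", "Divorced", "Married"],
    "Education_Level": ["Graduate", "Unknown", "High School", "Graduate"],
    "Customer_Age": [45, 38, 51, 29],
})

categorical_cols = ["Marital_Status", "Education_Level"]

# Each categorical column is replaced by one 0/1 indicator column per category.
encoded = pd.get_dummies(customers, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Numeric columns such as the assumed `Customer_Age` pass through unchanged, while each category becomes its own indicator column.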

Data Upsampling
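The upsampling step motivated by the class imbalance noted in the exploratory analysis could look roughly like the sketch below. It assumes plain random oversampling with replacement, which may differ from the technique actually used, and the helper `upsample_minority` and the toy data are hypothetical.

```python
import numpy as np

def upsample_minority(X, y, minority_label=1, seed=0):
    """Resample minority rows with replacement until both classes have equal size."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)  # assumes the minority is smaller
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra_idx])
    rng.shuffle(keep)  # avoid leaving the classes in contiguous blocks
    return X[keep], y[keep]

# Toy data with a 16% churn share (8 of 50 rows); the values themselves are made up.
X = np.arange(100).reshape(50, 2)
y = np.array([1] * 8 + [0] * 42)

X_bal, y_bal = upsample_minority(X, y)
print(np.bincount(y_bal))  # class counts after balancing
```

Because the duplicated rows are exact copies, it is generally safer to resample only the training split so that no copy of a test row leaks into training.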

Dimensionality Reduction

We will use principal component analysis (PCA) to reduce the dimensionality of the one-hot encoded categorical variables. This loses some of the variance, but using a couple of principal components instead of tens of one-hot encoded features should help us construct a better model.
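To make the trade-off concrete, here is a small NumPy sketch of PCA via singular value decomposition that also reports how much variance the retained components keep. The one-hot block is randomly generated and the helper `pca_project` is hypothetical; scikit-learn's `PCA` class would be the more usual library route.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal components and return their variance ratios."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values ** 2
    ratio = explained / explained.sum()
    return X_centered @ Vt[:n_components].T, ratio[:n_components]

# Made-up one-hot block: 200 customers, one categorical feature with 6 levels.
rng = np.random.default_rng(0)
levels = rng.integers(0, 6, size=200)
one_hot = np.eye(6)[levels]

components, ratio = pca_project(one_hot, n_components=2)
print(components.shape, f"variance retained: {ratio.sum():.2f}")
```

The retained-variance figure shows how much information the two components preserve, which is the quantity to watch when choosing how many components to keep.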

Model Selection and Evaluation

Cross Validation

Training

Model Evaluation On Original Data (Before Upsampling)

Conclusion